Abstract

There are test-equating situations in which it may be appropriate to fit a loglinear or other type of 
probability model to the joint distribution of a total score on a test and a score on part of that test. 
For anchor test designs, this situation arises for internal anchor tests, which are embedded within 
the total test. Similarly, a part-whole relationship arises between two scores when a few test 
items are dropped from a test and a single group design is used to equate the scores of the full 
test to the part that remains after those items are deleted. In these part-whole situations, the 
resulting bivariate frequency distribution will exhibit structural zeros due to the fact that some 
scores on the total test are impossible for specific values on the partial test. Without knowing 
where the structural zeros are, it is impossible to distinguish them from zero frequencies that are 
simply due to the size of the sample (i.e., sampling zeros). When probability models are estimated
for these joint distributions, it is usual to require the models to assign positive probability to the 
sampling zeros but to avoid assigning positive probability to the structural zeros. To do this, it is 
important to be able to locate where the structural zeros are in the bivariate distribution. When 
the scores on the tests are consecutive integers, it is easy to determine the location of the
structural zeros. This report gives a solution to the problem of locating the structural zeros that 
arise for a class of bivariate distributions that includes both number-right scores and formula 
scores that have been rounded to integer values. The result for rounded formula scores is a 
simple alteration of the case where all the scores are consecutive integers. 

Key words: Test equating, probability models, discrete bivariate distributions, test scores 
1. Introduction


This report is concerned with the location of impossible combinations of scores in certain 
types of bivariate score distributions. These cases routinely arise for the joint distributions of a 
total score, X + A, and one of its part scores, A. The part-whole relationship between the two 
scores and the limited range of the possible scores for X and A combine to make some scores for 
X + A impossible when A is fixed at a specific value. These impossible combinations of scores 
are called structural zeros (SZs) (Bishop, Fienberg, & Holland, 1975) and should be taken into 
account when models are fit to the joint distribution of the total score and the part score. In 
particular, these models should not put positive probability in the locations that are SZs. To 
avoid this, one must be able to locate where the impossible combinations occur in the 
distribution. 

An important case of the part-whole relation arises with the nonequivalent groups with 
anchor test design when A denotes an internal anchor test raw score and X + A denotes the total
test raw score. Another part-whole relationship arises when A contains a small number of items
that are being deleted from the test and the shorter score, X, is to be equated to the score on
the longer test, X + A, using the single group design. In both of these situations, it is
common to consider presmoothing the joint distributions using loglinear models as, for example, 
von Davier, Holland, and Thayer (2004) recommended when using the kernel method of test 
equating. 

As discussed below, when X and A are number-right scored, it is easy to locate where the 
SZs are. However, in practice there may also be interest in the case where X + A and A are both 
formula scores that are rounded to integer values. This report gives a solution to this problem 
that includes rounded formula scores of a very general sort. 

If A and X are both number-right scores, then both A and X + A can take on only
consecutive integer values, starting at 0 and increasing up to $A_{\max}$ for A, and $X_{\max} + A_{\max}$
for X + A, where $A_{\max}$ and $X_{\max}$ are the maximum values of A and X, respectively. Integer
scores that have a nonzero lower bound can arise in practice when, for example, some items are
flawed and, rather than being eliminated from the test, are scored as correct regardless of the
answer given. In this situation, the smallest possible value of X, $X_{\min}$, is greater than 0.
Another situation where the lowest score is not 0 arises for test scores that include the score on
an essay that is, for example, graded from 1 to 6.
Hence, it may happen in practice that X and A can only take on consecutive integer
values, but not necessarily starting at 0. In this more general case, A can take on values from
$A_{\min}$ to $A_{\max}$, and X + A can take on values from $X_{\min} + A_{\min}$ to $X_{\max} + A_{\max}$.

For consecutive integer scores, the location of the SZs is easy to identify. If A = j, then
every value from $X_{\min} + j$ to $X_{\max} + j$ is a possible value of X + A, and no other values are.
Any value of X + A outside this range is impossible when A = j, so the combinations of A = j
with X + A scores outside this range are the SZs.

As j ranges from $A_{\min}$ to $A_{\max}$, the two-way array of frequencies for X + A by A has a
parallelogram of possible combinations bounded by the cells (X + A, A) indexed by

$(i + A_{\min},\ A_{\min})$ for $i = X_{\min}, \ldots, X_{\max}$ (left-side boundary of the parallelogram),

$(i + A_{\max},\ A_{\max})$ for $i = X_{\min}, \ldots, X_{\max}$ (right-side boundary of the parallelogram),

$(X_{\min} + j,\ j)$ for $j = A_{\min}, \ldots, A_{\max}$ (top boundary of the parallelogram),

and

$(X_{\max} + j,\ j)$ for $j = A_{\min}, \ldots, A_{\max}$ (bottom boundary of the parallelogram).

For the case of consecutive integer values for both X and A, the diagram in Figure 1 is a 
schematic representation of the SZs and non-SZs in the joint distribution of X + A and A. 

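[Figure 1: A on the horizontal axis, from $A_{\min}$ to $A_{\max}$; X + A on the vertical axis, with tick labels $X_{\min} + A_{\min}$ and $X_{\max} + A_{\min}$.]

Figure 1. The standard parallelogram of non-SZs when X and A have only consecutive
integer possible scores.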
The region described above and in Figure 1 is the standard parallelogram of non-SZs. In
summary, when the possible scores of X and A are both consecutive integers (including
number-right scores), the standard parallelogram locates the non-SZs in the joint
distribution of X + A and A, and all other locations in the joint distribution are SZs.
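Because the standard parallelogram depends only on the four extremes, it is simple to generate directly. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes consecutive integer scores for X and A as in this section. It marks each cell (t, a), with t = X + A and a = A, as 1 if it lies in the standard parallelogram and 0 if it is an SZ.

```python
def standard_parallelogram(x_min, x_max, a_min, a_max):
    """Non-SZ cells (t, a), with t = X + A and a = A, for consecutive-integer scores."""
    return {(x + a, a)
            for a in range(a_min, a_max + 1)
            for x in range(x_min, x_max + 1)}


def sz_grid(x_min, x_max, a_min, a_max):
    """0/1 grid over the whole table: 1 = possible score combination, 0 = SZ.

    Rows run over t = X_min + A_min, ..., X_max + A_max; columns over a = A_min, ..., A_max.
    """
    cells = standard_parallelogram(x_min, x_max, a_min, a_max)
    return [[int((t, a) in cells) for a in range(a_min, a_max + 1)]
            for t in range(x_min + a_min, x_max + a_max + 1)]


# X scored 0..3 and A scored 0..2 (number-right scores on 3 and 2 items)
for row in sz_grid(0, 3, 0, 2):
    print(*row)
```

For X scored 0 to 3 and A scored 0 to 2, the printed rows are 1 0 0, 1 1 0, 1 1 1, 1 1 1, 0 1 1, and 0 0 1; the zeros in the upper right and lower left corners are the two triangles of SZs.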

2. Rounding Scores 

In practice, more complicated cases than those discussed in section 1 may also arise. For
example, A and X need not have nonnegative integer-valued scores; their scores may be negative or
have fractional parts. Such scores arise when corrections for guessing are used, such as
the well-known formula, R - (1/4)W, for scores from five-option multiple-choice (MC) tests.
They can also arise in other ways, such as theta estimates from item response theory (IRT) models.
This report does not make strong assumptions about the nature of these possibly negative and
fractional scores at this point. The primary motivation for this report is to extend the results
described in Figure 1 to the case of rounded formula scores.

When fractional scores are present, the scores of X + A and A are often rounded
separately to integer values. I do not know how widespread this practice is, but it certainly arises
at ETS.

The rounding method considered in this report is expressed by the rounding function [x],
defined for any real number x by

$$[x] = n \quad \text{if and only if} \quad n - 0.5 \le x < n + 0.5 \text{ and } n \text{ is an integer.} \tag{1}$$

The rounding function in (1) corresponds to what is often called rounding up because 
numbers with fractional parts equal to 0.5 are rounded up to the nearest integer. This report uses 
this definition because it is the one traditionally used at ETS under the rubric of rounding in 
favor of the candidate. Of course, there are other ways to round numbers, such as rounding
down or rounding to even, but they play no role in this discussion.

The remainder function, r(x), is defined as follows:

$$r(x) = x - [x], \tag{2}$$

so that for any real number x,

$$-0.5 \le r(x) < 0.5, \tag{3}$$

and

$$x = [x] + r(x). \tag{4}$$
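Because (1) rounds halves up, it differs from Python's built-in round, which rounds halves to even. The sketch below implements [x] and r(x) under the assumption that scores are held as exact fractions (Python's fractions.Fraction), so that values such as -0.5 are not perturbed by floating-point error; the names rnd and rem are illustrative, not part of any established package.

```python
from fractions import Fraction
import math


def rnd(x):
    """[x] from (1): the integer n with n - 0.5 <= x < n + 0.5 (halves round up)."""
    return math.floor(x + Fraction(1, 2))


def rem(x):
    """r(x) = x - [x] from (2)."""
    return x - rnd(x)


for x in [Fraction(-3, 4), Fraction(-1, 2), Fraction(1, 2), Fraction(5, 4)]:
    print(x, rnd(x), rem(x))
# -3/4 -1 1/4
# -1/2 0 -1/2
# 1/2 1 -1/2
# 5/4 1 1/4
```

Note that -1/2 rounds to 0 and 1/2 rounds to 1, so both have remainder -1/2, the lower end of the range in (3).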

The two functions [x] and r(x) have four useful properties, summarized in Lemma 1. 

Lemma 1: For any integer n and real numbers x and y,

(a) $[n + x] = n + [x]$,

(b) $[r(x)] = 0$,

(c) if $x \le y$, then $[x] \le [y]$,

and

(d) $[x + y] = [x] + [y] + [r(x) + r(y)]$.

Proof: The first three properties are obvious. Property (d) follows from property (a) and the fact
that, by (4), $[x + y] = [[x] + r(x) + [y] + r(y)]$. QED.

The four properties in Lemma 1 simplify the discussion used in this report. 
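Lemma 1 can also be spot-checked numerically. The sketch below assumes exact fractions and repeats the illustrative rnd and rem helpers so that it runs on its own; it tests properties (a) to (d) on every multiple of 1/12 between -3 and 3, a grid that contains the halves, thirds, and quarters produced by formula scoring. A finite check is not a proof, but it quickly exposes a mistaken implementation of (1).

```python
from fractions import Fraction
import math


def rnd(x):
    return math.floor(x + Fraction(1, 2))  # round half up, as in (1)


def rem(x):
    return x - rnd(x)                      # r(x) from (2)


def check_lemma1(values, integers=range(-3, 4)):
    """Assert Lemma 1 (a)-(d) for all listed values; return True if every check passes."""
    for x in values:
        assert rnd(rem(x)) == 0                                        # (b)
        for n in integers:
            assert rnd(n + x) == n + rnd(x)                            # (a)
        for y in values:
            if x <= y:
                assert rnd(x) <= rnd(y)                                # (c)
            assert rnd(x + y) == rnd(x) + rnd(y) + rnd(rem(x) + rem(y))  # (d)
    return True


# Every multiple of 1/12 in [-3, 3]
grid = [Fraction(k, 12) for k in range(-36, 37)]
print(check_lemma1(grid))  # True
```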

3. Formula Scores 

This report uses the following framework for discussing formula scores that arise from 
corrections for guessing. A score, S, is a formula score here if it is of the form 

$$S = R_0 + \{R_1 - W_1\} + \{R_2 - \tfrac{1}{2}W_2\} + \{R_3 - \tfrac{1}{3}W_3\} + \cdots \tag{5}$$

In (5), $R_j$ denotes the number right and $W_j$ the number wrong for items that are corrected
for guessing by using a formula score of the form $R_j - (1/j)W_j$. Usually these are MC test items
with j + 1 options or possible responses.

In (5), the sequence continues so that it includes all of the types of formula scores that are 
used in the test. $R_3 - (1/3)W_3$ is the formula score for four-option test items, and the sequence
does not need to stop there if there are other types of test items on the whole test with more
answer options. $R_0$ denotes a consecutive integer score for an item, such as an essay, which is
graded from, say, 1 to 6, or with any other consecutive integer values, say u to v. $R_0$ is also meant
to include the sum of such item scores, the only proviso being that the possible values of $R_0$ are
consecutive integers with no gaps.

Thus, the term formula scores includes scores that are the sums of scores that are 
corrected for guessing in possibly more than one way—that is, tests that include both four- and 
five-choice items—as well as other consecutive integer scores that are added to the sums of 
formula scores. This generality is needed to include the many special cases that arise at ETS. 
Remarkably, such generality does not interfere with the analysis. 

One type of score that (5) does not cover arises when two different formula scores of the 
type described in (5) are each multiplied by a weight and then added together. Certain types of 
composite scores are like this, for example, R + 3.2T, where R is a number-right score for a set 
of multiple-choice questions and T is an essay score. These types of scores are not included in 
this analysis and must be dealt with in a different way. 
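For short tests, the set of achievable values of a formula score of the form (5) can be listed by brute force. In the sketch below, the item composition is passed as a hypothetical dictionary mapping the number of options k to the number of such items, and a k-option item scores 1 if right, 0 if omitted, and -1/(k - 1) if wrong, so that k = j + 1 matches the term $R_j - (1/j)W_j$. An optional $R_0$ component with consecutive integer values u to v is added at the end. Weighted composites such as R + 3.2T are outside this sketch, just as they are outside (5).

```python
from fractions import Fraction


def formula_scores(items, r0_range=None):
    """All achievable values of a formula score S of the form (5).

    items:    hypothetical dict {number of options k: number of items}; a k-option
              item scores 1 if right, 0 if omitted, -1/(k - 1) if wrong.
    r0_range: optional (u, v), the consecutive-integer range of R_0.
    """
    scores = {Fraction(0)}
    for k, count in items.items():
        per_item = (Fraction(1), Fraction(0), Fraction(-1, k - 1))  # right, omit, wrong
        for _ in range(count):
            scores = {s + v for s in scores for v in per_item}
    if r0_range is not None:
        u, v = r0_range
        scores = {s + r for s in scores for r in range(u, v + 1)}
    return scores


# Three five-option items scored R - (1/4)W
S = formula_scores({5: 3})
print(min(S), max(S), len(S))  # -3/4 3 10
```

For three five-option items, the ten achievable scores run from -3/4 to 3; the value 1/4 is missing because it would require R = 1 and W = 3 on a three-item test.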

There are some useful properties of rounded formula scores of the form defined by (5).
To study them, this report uses Lemma 1a to write [S] as

$$[S] = R_0 + R_1 - W_1 + R_2 + R_3 + \cdots + [-\tfrac{1}{2}W_2 - \tfrac{1}{3}W_3 - \cdots]. \tag{6}$$

From (6) observe that the fractional parts of S all stem from the items that are answered 
incorrectly. Lemma 2 states, without proof, two simple but important observations about rounded 
formula scores of the form given in (5). 

Lemma 2: If S is a formula score as defined in (5), then

(a) every integer value from $[S_{\min}]$ to $[S_{\max}]$ can be achieved by an integer value of S,
except possibly for $[S_{\min}]$, and

(b) if $[S] = [S_{\max}]$, then $S = S_{\max}$, an integer.

In part (a) of Lemma 2, it is possible that only an S-value with a fractional part can round
to $[S_{\min}]$. As a simple example, if S is from a three-item test and the rule R - (1/4)W is used
for scoring, then $S_{\min} = -3/4$ and $[S_{\min}] = -1$. In this case, no achievable integer value of S
can round to -1. Part (b) of Lemma 2 means that the only way to get the highest score on a
formula-scored test is to get every item correct, which yields an integer score. Nothing else
rounds to the top score. Section 4 uses the two properties in Lemma 2 repeatedly.
4. Structural Zeros for Rounded Scores

Consider the joint occurrence of [X + A] = i and [A] = j. Then, just as in the consecutive 
integer case discussed in section 1, it is easy to see that some combinations of i and j are 
impossible and, therefore, create SZs. 

For example, if X is a 10-item test, A is a 20-item test, and the scores are all of the form
R - (1/4)W, then it is impossible for [A] = 20 and, at the same time, for [X + A] = 31 or higher
or for [X + A] = 17 or less. [X + A] = 30 can only arise when X + A = 10 + 20, and [X + A] = 18
arises when X = -10/4 = -2.5 and X + A = -2.5 + 20 = 17.5, which rounds to 18. Note that the
only way for [A] = 20 is for $A = A_{\max} = 20$, as indicated in Lemma 2b.
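This example can be confirmed by brute force. The sketch below assumes five-option items throughout and uses illustrative helper names; it collects every value of [X + A] that occurs together with [A] = 20.

```python
from fractions import Fraction
import math


def rnd(x):
    return math.floor(x + Fraction(1, 2))  # round half up, as in (1)


def scores(n_items, n_options=5):
    """Every R - W/(k - 1) with R + W <= n_items, where k = n_options."""
    pen = Fraction(1, n_options - 1)
    return {r - w * pen for r in range(n_items + 1) for w in range(n_items - r + 1)}


X, A = scores(10), scores(20)
# All values of [X + A] that occur together with [A] = 20
totals = {rnd(x + a) for a in A if rnd(a) == 20 for x in X}
print(min(totals), max(totals))  # 18 30
```

Every integer from 18 to 30 occurs, and the only A-score that rounds to 20 is A = 20 itself.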

Now suppose, in the general case of rounded formula scores, that [A] = j. What are the
possible values of [X + A]? The answer is, at most, a slight alteration to the standard
parallelogram described in Figure 1, but stating the final result (which the end of this section
does) requires examining several situations. Figure 2 gives a schematic representation of the
areas of the joint distribution of [X + A] and [A] that arise in the rest of the discussion.


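[Figure 2: schematic of the joint distribution of [X + A] and [A], with columns for [A] = $[A_{\min}], [A_{\min}] + 1, \ldots, A_{\max}$.]

Figure 2. The standard parallelogram and other areas relevant to SZs and non-SZs for
rounded formula scores.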
Section 5 gives three simple examples that are intended to help the reader understand the
more general derivations that follow it. 

In general, Lemma 1c can be used to assert that [X + A] must be within these limits:

$$[X_{\min} + A] \le [X + A] \le [X_{\max} + A]. \tag{7}$$

To compute various rounded sums of formula scores, this report makes repeated use of
Lemma 1d, that is, the formula

$$[X + A] = [X] + [A] + [r(X) + r(A)]. \tag{8}$$

In (7), the upper limit $[X_{\max} + A]$ is easy to evaluate when X is a formula score. The
result is summarized in Theorem 1.

Theorem 1: If X is a formula score and [A] = j, then

$$[X + A] \le X_{\max} + j. \tag{9}$$

Proof: From (7), examine $[X_{\max} + A]$. But $X_{\max}$ is an integer, so from Lemma 1a,
$[X_{\max} + A] = X_{\max} + j$. QED.

Theorem 1 shows that if [A] = j for any j from $[A_{\min}]$ to $A_{\max}$, then any [X + A] score
higher than $X_{\max} + j$ is impossible and produces an SZ. Thus, the lower triangle shown in Figure
2 contains only SZs.

Next, consider all of the standard parallelogram except its left-side boundary.
Theorem 2 gives the result.

Theorem 2: If A is a formula score and $[A] = j > [A_{\min}]$, then every integer score from $[X_{\min}] + j$
to $[X_{\max}] + j$ can be achieved by [X + A].

Proof: If $j > [A_{\min}]$, then, from Lemma 2a, [A] = j can be achieved by an integer value of A.
Hence, assume that r(A) = 0. Then (8), together with Lemma 1b, gives

$$[X + A] = [X] + [A] + [r(X) + r(A)] = [X] + j + [r(X) + 0] = [X] + j + 0.$$

Now let [X] range from $[X_{\min}]$ to $[X_{\max}]$. QED.


Theorem 2 shows that if $[A] = j > [A_{\min}]$, then no SZs can occur for [X + A] from
$[X_{\min}] + j$ to $[X_{\max}] + j$. This shows that, for all of the columns to the right of $[A] = [A_{\min}]$
in Figure 2, the standard parallelogram includes only non-SZs for the joint distribution of
[X + A] and [A].

Next, consider the left-side boundary of the standard parallelogram in Figure 2. Theorem 
3 gives the result. 

Theorem 3: If X is a formula score and $[A] = [A_{\min}]$, then every integer score from
$[X_{\min}] + 1 + [A_{\min}]$ to $[X_{\max}] + [A_{\min}]$ can be achieved by [X + A].

Proof: Use (8) to show that

$$[X + A] = [X] + [A] + [r(X) + r(A)] = [X] + [A_{\min}] + [r(X) + r(A)]. \tag{10}$$

If $[X] \ge [X_{\min}] + 1$, then from Lemma 2a, [X] is achievable by taking X to be an integer, so
that r(X) = 0. Hence, for any $[X] \ge [X_{\min}] + 1$, (10) becomes, by Lemma 1b,

$$[X + A] = [X] + [A_{\min}] + [0 + r(A)] = [X] + [A_{\min}]. \tag{11}$$

This shows that every integer score from $[X_{\min}] + 1 + [A_{\min}]$ to $[X_{\max}] + [A_{\min}]$ can be
achieved by [X + A]. QED.

Theorem 3 shows that the left-side boundary of the standard parallelogram also contains
non-SZs for the joint distribution of [X + A] and [A], except possibly for the cell defined by
$[X + A] = [X_{\min}] + [A_{\min}]$ and $[A] = [A_{\min}]$. Theorem 4, below, indicates that under certain
conditions this cell can also be an SZ. For the rest of the discussion it is useful to define δ(A) by

$$\delta(A) = [r(X_{\min}) + r(A)]. \tag{12}$$

From inequality (3) it follows that for any A,

$$-1 \le \delta(A) \le 1, \tag{13}$$

so that, from the definition of [x], δ(A) can only take on the values -1, 0, or 1.
Theorem 4 examines the case of the upper left-hand corner cell of the standard
parallelogram.

Theorem 4: If $[A] = [A_{\min}]$, then $[X + A] > [X_{\min}] + [A_{\min}]$ for all achievable X and A if and
only if $\delta(A_{\min}) = 1$.

Proof: Because $X \ge X_{\min}$ and $A \ge A_{\min}$, Lemma 1c and (8) give
$[X + A] \ge [X_{\min} + A_{\min}] = [X_{\min}] + [A_{\min}] + [r(X_{\min}) + r(A_{\min})] = [X_{\min}] + [A_{\min}] + \delta(A_{\min})$,
with equality when $X = X_{\min}$ and $A = A_{\min}$, from which the result follows. QED.

The combined effect of Theorems 1 to 4 is to show that, except for the upper left-hand 
cell of the standard parallelogram, the lower triangle and the standard parallelogram describe the 
SZs and non-SZs of rounded formula scores in exactly the same way as they do for consecutive 
integer scores. 

Now it is time to examine the areas of Figure 2 denoted as the super diagonal and the 
upper triangle. These areas are relevant to the smallest values that can be achieved for [X + A] 
for a given value of [A]. 

The lower bound that parallels Theorem 1 is weaker, and there are various special cases to
examine. A basic result is given in Theorem 5.

Theorem 5: If [A] = j, then

$$[X + A] \ge [X_{\min}] - 1 + j. \tag{14}$$

Proof: From (7), examine $[X_{\min} + A]$. Next, (8) shows that

$$[X_{\min} + A] = [X_{\min}] + [A] + [r(X_{\min}) + r(A)] = [X_{\min}] + j + \delta(A). \tag{15}$$

Now apply (13), so that $\delta(A) \ge -1$, from which (14) follows. QED.

Theorem 5 shows that the upper triangle in Figure 2 contains only SZs for the joint
distribution of [X + A] and [A]. To complete the picture, examine the super diagonal in Figure 2.
Note that the super diagonal includes the cell or score combination defined by $[A] = [A_{\min}]$ and
$[X + A] = [X_{\min}] + [A_{\min}] - 1$, even though this cell was not included in Figure 1. For clarity, the
super diagonal is defined by the following combinations of scores:

$$[X + A] = [X_{\min}] + j - 1 \text{ and } [A] = j, \quad \text{for } j = [A_{\min}], \ldots, A_{\max}. \tag{16}$$
The inequality in Theorem 5 can be strengthened when A is a formula score and $[A] = A_{\max}$.
This is done in Theorem 6.

Theorem 6: If A is a formula score and $[A] = A_{\max}$, then

$$[X + A] \ge [X_{\min}] + A_{\max}. \tag{17}$$

Proof: From (7), examine $[X_{\min} + A]$, and then from (8) conclude that

$$[X_{\min} + A] = [X_{\min}] + [A] + [r(X_{\min}) + r(A)] = [X_{\min}] + A_{\max} + \delta(A).$$

But from Lemma 2b, since A is a formula score and $[A] = A_{\max}$, we have $A = A_{\max}$, an integer, so
r(A) = 0 and hence, by Lemma 1b, $\delta(A) = [r(X_{\min}) + 0] = 0$. Therefore

$$[X + A] \ge [X_{\min}] + A_{\max}. \text{ QED.}$$

Theorem 6 shows that the rightmost cell of the super diagonal is always an SZ for
rounded formula scores. The cells of the rest of the super diagonal may or may not be SZs. The
proof of Theorem 6 shows that $\delta(A_{\max}) = 0$.

Revisiting the proof of Theorem 5, one sees that the lower bound in (14) may be analyzed
more carefully. Equation (15) may be written as

$$[X_{\min} + A] = [X_{\min}] + [A] + \delta(A). \tag{18}$$

Equation (18) is the lower bound for [X + A] when [A] is a given value. When δ(A) = -1, the
lower bound is in the super diagonal. In that case, the corresponding cell in the super diagonal
is a non-SZ. If δ(A) = 0, then the lower bound is below the super diagonal, on the upper
boundary of the standard parallelogram, and this value of A cannot reach the corresponding cell
above it in the super diagonal. Finally, the case of δ(A) = 1 can be ignored because Theorem 2
shows that cells below the upper boundary of the standard parallelogram are never SZs for
formula scores, except for the case described in Theorem 4.

Before addressing the cells of the super diagonal in more detail, let me dispose of the
leftmost cell of the super diagonal. Theorem 7 is similar to Theorem 4 in that it concerns a single
cell of Figure 2, the one in the super diagonal just above the upper left-hand cell of the standard
parallelogram.

Theorem 7: If $[A] = [A_{\min}]$, then $[X_{\min} + A_{\min}] = [X_{\min}] + [A_{\min}] - 1$ if and only if
$\delta(A_{\min}) = -1$.

Proof: From (8) and the definition of δ(A),

$$[X_{\min} + A_{\min}] = [X_{\min}] + [A_{\min}] + \delta(A_{\min}),$$

from which the result follows. QED.

Theorem 7 shows that the leftmost cell of the super diagonal can be achieved by [X + A]
if and only if $\delta(A_{\min}) = -1$. Theorem 4 implies that neither that cell nor the one below it can
be achieved by [X + A] if $\delta(A_{\min}) = 1$.

The cells of the super diagonal can be achieved by [X + A] and [A] if and only if a value
of A exists such that [A] = j and $[X + A] = [X_{\min}] + j - 1$. This can never happen for $j = A_{\max}$ for
formula scores, and Theorem 7 gives the only condition under which it can happen for $j = [A_{\min}]$
for formula scores. So now suppose that A is a formula score and that

$$[A_{\min}] < j < A_{\max}. \tag{19}$$

The problem of the super diagonal reduces to the following question: if j satisfies (19),
when does a value of A exist such that [A] = j and δ(A) = -1? If A can be chosen so that [A] = j
and δ(A) = -1, then the corresponding score on the super diagonal can be achieved and is a
non-SZ. If δ(A) > -1 for every A such that [A] = j, then the corresponding score on the super
diagonal cannot be achieved and is an SZ.

Theorem 8 gives the condition under which δ(A) = -1.

Theorem 8: δ(A) = -1 if and only if $-1 \le r(X_{\min}) + r(A) < -0.5$.

Proof: By definition, $\delta(A) = [r(X_{\min}) + r(A)]$, and by (3), $r(X_{\min}) + r(A) \ge -1$. By (1), this sum
rounds to -1 if and only if $-1 \le r(X_{\min}) + r(A) < -0.5$. QED.
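Theorem 8 and inequality (13) can be spot-checked over a grid of possible remainders. The sketch below takes every multiple of 1/24 in [-1/2, 1/2), the range allowed by (3), for both $r(X_{\min})$ and r(A); the helper check_theorem8 is illustrative, and a finite check of this kind is a sanity test rather than a proof. It also confirms that δ(A) = -1 forces both remainders to be strictly negative.

```python
from fractions import Fraction
import math


def rnd(x):
    return math.floor(x + Fraction(1, 2))  # round half up, as in (1)


def check_theorem8(remainders):
    """Assert Theorem 8, (13), and the sign remark for every pair of remainders."""
    for rx in remainders:          # plays the role of r(X_min)
        for ra in remainders:      # plays the role of r(A)
            total = rx + ra
            delta = rnd(total)
            assert delta in (-1, 0, 1)                                # (13)
            assert (delta == -1) == (-1 <= total < Fraction(-1, 2))   # Theorem 8
            if delta == -1:
                assert rx < 0 and ra < 0  # both remainders strictly negative
    return True


# Every multiple of 1/24 in [-1/2, 1/2), the range of r(x) allowed by (3)
grid = [Fraction(k, 24) for k in range(-12, 12)]
print(check_theorem8(grid))  # True
```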

It is easy to show that the condition $-1 \le r(X_{\min}) + r(A) < -0.5$ in Theorem 8 and
inequality (3) combine to imply that both $r(X_{\min})$ and r(A) must be strictly less than 0: if either
remainder were nonnegative, (3) would force the sum to be at least -0.5. Theorem 9
finishes off the remaining cells of the super diagonal. It shows that either all of the cells are
possible score combinations, except for the rightmost cell, or none of them are.

Theorem 9: If A is a formula score with $[A] = j < A_{\max}$ and $\delta(A_{\min}) = -1$, then [X + A] can
achieve the score $[X_{\min}] + j - 1$.

Proof: From Theorem 8, $\delta(A_{\min}) = -1$ if and only if $-1 \le r(X_{\min}) + r(A_{\min}) < -0.5$, so that
$r(A_{\min}) < 0$. Hence,

$$A_{\min} < [A_{\min}]. \tag{20}$$

Now suppose A is such that $[A] = j < A_{\max}$. Is it possible to find an A such that
$r(X_{\min}) + r(A) < -0.5$, as is required in Theorem 8? By hypothesis, $A_{\min}$ satisfies this for
$[A] = [A_{\min}]$. A corresponding value of A for $[A] = [A_{\min}] + 1$ can be found by reducing
the number of wrong responses appropriately and replacing them with omitted or correct
responses, thus increasing $A_{\min}$ by 1 point. This approach steps through all of the values of
$[A] = j < A_{\max}$. QED.

Together, Theorems 7 and 9 imply that if $\delta(A_{\min}) = -1$, then all of the super diagonal in
Figure 2 is achievable by [X + A] and [A] except for the rightmost cell, which, by Theorem 6, is
always an SZ. If $\delta(A_{\min}) > -1$, then the super diagonal contains only SZs.

The Final Result for Rounded Formula Scores 
The three possibilities are as follows: 

1. If $\delta(A_{\min}) = 0$, then the standard parallelogram describes all of the non-SZs for the
joint distribution, and all of the other cells are SZs.

2. If $\delta(A_{\min}) = 1$, then the standard parallelogram describes all of the non-SZs for the
joint distribution except for its upper leftmost cell, which is also an SZ. The
remaining cells are all SZs.

3. If $\delta(A_{\min}) = -1$, then the standard parallelogram plus the super diagonal minus its
rightmost cell describe the non-SZs for the distribution. All the other cells are SZs.

The examples of section 5 illustrate these three possibilities. 
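A program that already has the rounded extremes and $\delta(A_{\min})$ can build the map of non-SZs directly from these three cases. The sketch below encodes the cases exactly as stated above; the function name and argument order are hypothetical, and cells are (i, j) pairs with i = [X + A] and j = [A].

```python
def non_sz_cells(x_min_r, x_max, a_min_r, a_max, delta):
    """Non-SZ cells (i, j), i = [X + A], j = [A], following the three cases above.

    x_min_r, a_min_r: the rounded minima [X_min] and [A_min]
    x_max, a_max:     the maxima X_max and A_max (integers for formula scores)
    delta:            delta(A_min), one of -1, 0, 1
    """
    # Standard parallelogram: [X_min] + j <= i <= X_max + j in each column j.
    cells = {(i, j)
             for j in range(a_min_r, a_max + 1)
             for i in range(x_min_r + j, x_max + j + 1)}
    if delta == 1:
        # Case 2: the upper left-hand cell of the parallelogram is also an SZ.
        cells.discard((x_min_r + a_min_r, a_min_r))
    elif delta == -1:
        # Case 3: add the super diagonal (16) except its rightmost cell, j = A_max.
        cells |= {(x_min_r + j - 1, j) for j in range(a_min_r, a_max)}
    elif delta != 0:
        raise ValueError("delta(A_min) must be -1, 0, or 1")
    return cells


# Inputs of Example 1 (delta = 1) and Example 2 (delta = -1) from section 5
ex1 = non_sz_cells(-1, 3, -1, 3, 1)
ex2 = non_sz_cells(0, 2, 0, 2, -1)
print(len(ex1), len(ex2))  # 24 11
```

With the inputs of Examples 1 and 2, it returns 24 and 11 cells, the numbers of 1s in Tables 1 and 2.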
From this discussion, the value of $\delta(A_{\min})$ determines where the SZs and non-SZs of the
joint distribution of [X + A] and [A] are located. Clearly, the standard parallelogram plays a
major role, as does the super diagonal that lies directly above it. Computer programs that are
used to locate the SZs for such joint distributions need to know the value of

$$\delta(A_{\min}) = [r(X_{\min}) + r(A_{\min})]. \tag{21}$$

Whether this value should be supplied by the user or computed by the program from other
information the user provides is a judgment that programmers will need to make.
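As one illustration of the second option, a program could compute (21) from the item composition of X and A. The sketch below assumes that each part is a formula score of the form (5), described by a hypothetical dictionary from the number of options k to the number of items, with an optional minimum for an $R_0$ component; the minimum of each part is then reached when every formula-scored item is answered wrong and $R_0$ is at its minimum.

```python
from fractions import Fraction
import math


def rnd(x):
    return math.floor(x + Fraction(1, 2))  # round half up, as in (1)


def rem(x):
    return x - rnd(x)                      # r(x) from (2)


def part_min(items, r0_min=0):
    """Minimum of a formula score (5): every formula-scored item wrong, R_0 at its minimum.

    items: hypothetical dict {number of options k: number of items}; a wrong answer
    to a k-option item costs 1/(k - 1) points.
    """
    return r0_min - sum(Fraction(n, k - 1) for k, n in items.items())


def delta_a_min(x_items, a_items, x_r0_min=0, a_r0_min=0):
    """delta(A_min) = [r(X_min) + r(A_min)], as in (21)."""
    x_min = part_min(x_items, x_r0_min)
    a_min = part_min(a_items, a_r0_min)
    return rnd(rem(x_min) + rem(a_min))


print(delta_a_min({5: 3}, {5: 3}),  # Example 1
      delta_a_min({5: 2}, {5: 2}),  # Example 2
      delta_a_min({5: 2}, {5: 3}))  # Example 3
```

For the item compositions of Examples 1, 2, and 3 in section 5, this prints 1 -1 0.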

5. Three Simple Examples of Rounded Formula Scores 
Example 1: (An example of $\delta(A_{\min}) = 1$.) X and A both have three five-option items with
R - (1/4)W being the formula score:

$$X_{\min} = -3/4, \quad [X_{\min}] = -1, \quad r(X_{\min}) = 1/4 > 0,$$

$$A_{\min} = -3/4, \quad [A_{\min}] = -1, \quad r(A_{\min}) = 1/4 > 0,$$

$$\delta(A_{\min}) = [1/4 + 1/4] = [0.5] = 1.$$

In this example, $[X_{\min} + A_{\min}] = [-6/4] = -1$, while $[X_{\min}] + [A_{\min}] = -2$, so it is
impossible for $[A] = [A_{\min}] = -1$ and $[X + A] = [X_{\min}] + [A_{\min}] = -2$. If 0 denotes an SZ and 1 a
non-SZ, then Table 1 corresponds to Figure 2 for this example. In Table 1, the standard
parallelogram and the extra SZ within it are shaded differently. Note that this example can be
generalized to any X and A with 4n + 3 and 4m + 3 five-option MC items, respectively.
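Table 1

The Location of SZs for Example 1

| [X + A] \ [A] | -1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| -2 | 0 | 0 | 0 | 0 | 0 |
| -1 | 1 | 1 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 3 | 0 | 1 | 1 | 1 | 1 |
| 4 | 0 | 0 | 1 | 1 | 1 |
| 5 | 0 | 0 | 0 | 1 | 1 |
| 6 | 0 | 0 | 0 | 0 | 1 |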


Example 2: (An example of $\delta(A_{\min}) = -1$.) X and A both have two five-option items with
R - (1/4)W being the formula score:

$$X_{\min} = -2/4, \quad [X_{\min}] = 0, \quad r(X_{\min}) = -0.5 < 0,$$

$$A_{\min} = -2/4, \quad [A_{\min}] = 0, \quad r(A_{\min}) = -0.5 < 0,$$

$$\delta(A_{\min}) = [-0.5 - 0.5] = [-1] = -1.$$

In this example, $[X_{\min} + A_{\min}] = [-1] = -1$, while $[X_{\min}] + [A_{\min}] = 0$, so it is
possible for [A] = 0 and $[X_{\min} + A_{\min}] = -1$. Table 2 corresponds to Figure 2 for this example;
in it, the super diagonal and the standard parallelogram are shaded differently. Note that
this example can be generalized to any X and A with 4n + 2 and 4m + 2 five-option MC items,
respectively.


Table 2

The Location of SZs for Example 2

| [X + A] \ [A] | 0 | 1 | 2 |
|---|---|---|---|
| -1 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 1 | 1 | 0 |
| 2 | 1 | 1 | 1 |
| 3 | 0 | 1 | 1 |
| 4 | 0 | 0 | 1 |

Example 3: (An example of $\delta(A_{\min}) = 0$.) X has two five-option items and A has three
five-option items with R - (1/4)W being the formula score:

$$X_{\min} = -2/4, \quad [X_{\min}] = 0, \quad r(X_{\min}) = -0.5 < 0,$$

$$A_{\min} = -3/4, \quad [A_{\min}] = -1, \quad r(A_{\min}) = 1/4 > 0,$$

$$\delta(A_{\min}) = [-0.5 + 0.25] = [-0.25] = 0.$$

In this example, $[X_{\min} + A_{\min}] = [-5/4] = -1$, and $[X_{\min}] + [A_{\min}] = 0 - 1 = -1$, so it
is possible for [A] = -1 and $[X_{\min} + A_{\min}] = -1$ as well. Table 3 corresponds to Figure 2 for this
example; the standard parallelogram is shaded. Note that this example can be generalized to any
X with 4n + 2 and A with 4m + 3 five-option MC items.


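Table 3

The Location of SZs for Example 3

| [X + A] \ [A] | -1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| -1 | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 | 0 |
| 3 | 0 | 0 | 1 | 1 | 1 |
| 4 | 0 | 0 | 0 | 1 | 1 |
| 5 | 0 | 0 | 0 | 0 | 1 |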
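An independent check on Tables 1 to 3 is to enumerate every achievable pair (X, A) and record which rounded cells occur. The sketch below assumes tests made only of five-option items scored R - (1/4)W, rounds with (1), and uses hypothetical helper names; enumeration is practical only for short tests, and rows that consist entirely of SZs are not printed. For Examples 1 and 2 it reproduces the non-SZ cells of Tables 1 and 2. For Example 3, however, it also finds three super-diagonal cells that Table 3 marks as SZs, namely ([X + A], [A]) = (-1, 0), (0, 1), and (1, 2): for instance, X = -1/2 and A = -1/2 (two wrong answers on each part) give [A] = 0 and [X + A] = -1. Here $\delta(A_{\min}) = 0$, yet some A with [A] = j still satisfies the condition of Theorem 8, so a map of SZs for a particular test is worth cross-checking by enumeration.

```python
from fractions import Fraction
import math


def rnd(x):
    return math.floor(x + Fraction(1, 2))  # round half up, as in (1)


def scores(n_items, n_options=5):
    """Every R - W/(k - 1) with R + W <= n_items, where k = n_options."""
    pen = Fraction(1, n_options - 1)
    return {r - w * pen for r in range(n_items + 1) for w in range(n_items - r + 1)}


def achievable_cells(n_x, n_a, n_options=5):
    """All achievable ([X + A], [A]) cells; every other cell is an SZ."""
    x_scores, a_scores = scores(n_x, n_options), scores(n_a, n_options)
    return {(rnd(x + a), rnd(a)) for x in x_scores for a in a_scores}


def print_table(cells):
    """Print 1 (non-SZ) or 0 (SZ); rows are [X + A], columns are [A]."""
    cols = sorted({j for _, j in cells})
    rows = range(min(i for i, _ in cells), max(i for i, _ in cells) + 1)
    print("[X+A] \\ [A]:", *cols)
    for i in rows:
        print(f"{i:>11}", *(int((i, j) in cells) for j in cols))


for n_x, n_a in [(3, 3), (2, 2), (2, 3)]:  # Examples 1, 2, and 3
    print_table(achievable_cells(n_x, n_a))
    print()
```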


6. Implications for Degrees of Freedom for Chi-Square Tests 

In fitting loglinear models to bivariate score distributions as proposed, for example, in
Holland and Thayer (2000), the issue of the effect of SZs on the distributions of chi-square 
statistics naturally arises. The effect on the nominal degrees of freedom is easy to describe. In 
general, the effect of SZs is to reduce the nominal degrees of freedom by the number of SZs. For 
the cases that interest us here, this can be quite a substantial reduction. 

As an example, suppose the scores are all number-right and that there are J items in X
and L items in A. Thus, the cross tabulation of X + A with A has a total of (J + L + 1)(L + 1)
cells in it. The upper triangle in Figure 1 then contains

$$L + (L - 1) + (L - 2) + \cdots + 1 = \tfrac{1}{2}L(L + 1)$$

cells, and the lower triangle contains

$$1 + 2 + \cdots + L = \tfrac{1}{2}L(L + 1)$$

cells as well. Hence, the total number of SZs is L(L + 1). Reducing (J + L + 1)(L + 1) cells by
L(L + 1) is easily shown to yield (J + 1)(L + 1), which is just the number of possible score
combinations for (X, A). The nominal degrees of freedom are

$$\text{Nominal DF} = (J + 1)(L + 1) - 1 - s, \tag{22}$$

where s is the number of free parameters in the estimated loglinear model.
When the scores are rounded formula scores, J + 1 is replaced by

$$X_{\max} - [X_{\min}] + 1$$

and L + 1 by

$$A_{\max} - [A_{\min}] + 1.$$

In this case the analogue of (J + 1)(L + 1) - 1 - s is

$$\text{SP DF} = (X_{\max} - [X_{\min}] + 1)(A_{\max} - [A_{\min}] + 1) - 1 - s. \tag{23}$$


However, in this case the value of $\delta(A_{\min})$ needs to be taken into account as well. There
may be additional or fewer SZs, as indicated in the final result of section 4.

When $\delta(A_{\min}) = 0$, there are no additional SZs, and SP DF in (23) is the nominal degrees
of freedom.

When $\delta(A_{\min}) = 1$, there is one additional SZ, so the nominal degrees of freedom are
reduced by one, to SP DF - 1.

Finally, when $\delta(A_{\min}) = -1$, there are an additional $A_{\max} - [A_{\min}]$ non-SZs on the super
diagonal of Figure 2, so the nominal degrees of freedom are increased to

$$\text{SP DF} + A_{\max} - [A_{\min}].$$
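The bookkeeping in (22), (23), and the three adjustments can be written as a few small functions. The sketch below is illustrative; the names are hypothetical, s is the number of free parameters in the fitted loglinear model, and the rounded-score version applies the adjustments exactly as described above. The helper count_sz_number_right confirms the count L(L + 1) of SZs by direct enumeration.

```python
def count_sz_number_right(J, L):
    """Count SZs in the (J + L + 1) x (L + 1) table of X + A by A (number-right scores)."""
    return sum(1 for t in range(J + L + 1) for a in range(L + 1)
               if not 0 <= t - a <= J)  # cell possible iff X = t - a lies in 0..J


def nominal_df_number_right(J, L, s):
    """(22): (J + 1)(L + 1) - 1 - s."""
    return (J + 1) * (L + 1) - 1 - s


def nominal_df_rounded(x_min_r, x_max, a_min_r, a_max, delta, s):
    """(23), adjusted for delta(A_min) as described above.

    x_min_r and a_min_r are [X_min] and [A_min]; x_max and a_max are X_max and A_max.
    """
    sp_df = (x_max - x_min_r + 1) * (a_max - a_min_r + 1) - 1 - s
    if delta == 0:
        return sp_df
    if delta == 1:
        return sp_df - 1                  # one extra SZ: the upper left-hand cell
    if delta == -1:
        return sp_df + (a_max - a_min_r)  # super diagonal minus its rightmost cell
    raise ValueError("delta(A_min) must be -1, 0, or 1")


print(count_sz_number_right(20, 10), nominal_df_number_right(20, 10, 6))  # 110 224
```

For the inputs of Example 2 and, say, s = 5 parameters, nominal_df_rounded(0, 2, 0, 2, -1, 5) returns 5, which equals the 11 non-SZ cells of Table 2 minus 1 minus 5.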

While it is fairly easy to describe how the nominal degrees of freedom change as a result
of the SZs in the cases described in this report, in real applications the nominal degrees of
freedom are, themselves, of limited use. This is due to the large degree of sparseness that obtains
in most observed bivariate score distributions. This sparseness of data usually invalidates a direct
interpretation of the nominal degrees of freedom as the degrees of freedom applicable to
chi-square tests. However, differences in the nominal degrees of freedom for two nested loglinear
models are often valid as the degrees of freedom applicable to the difference in the
corresponding likelihood ratio chi-square tests for the two nested models. What really matters in
these tests of nested models is the difference in their numbers of parameters, that is, the s in (22)
and (23), which does not require a user to know the nominal degrees of freedom in (22) or (23).

7. Summary 

SZs arise in the bivariate distribution of a total score and a part score of the total test.
When fitting models to such bivariate distributions, as is routinely done in various equating
situations, the models should not put positive probability into any SZ. Therefore, it is important
to be able to locate where the SZs are, because the data cannot distinguish between an SZ and a
zero frequency that is due to sampling variability and low population proportions. In the case of
number-right scoring, it is fairly easy to locate the SZs. Example 3 illustrates this case. In the
case where the scores for both the total test and the part test are rounded formula scores, three
cases can arise. The quantity $\delta(A_{\min})$, defined in (21), can be used to identify which
case arises in a particular situation. These three cases are illustrated in Examples 1, 2, and 3.